Evaluation of a Vector Space Similarity Measure in a Multilingual Framework
Authors
Abstract
In this contribution, we propose a method that uses a multilingual framework to validate the relevance of the notion of vector-based semantic similarity between texts. The goal is to verify that vector-based semantic similarities can be reliably transferred from one language to another. More precisely, the idea is to test whether the relative positions of documents in a vector space associated with a given source language are close to those of their translations in the vector space associated with the target language. The experiments, carried out with both the standard Vector Space model and the more advanced DSIR model, have given very promising results.
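The test described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how closeness of relative positions is measured, so the sketch assumes cosine similarity between document vectors and a Pearson correlation between the pairwise similarities computed in each language. The function names (`cosine`, `pairwise_similarities`, `similarity_transfer_score`) are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarities(docs):
    """Cosine similarity for every unordered pair of documents, in a fixed order."""
    return [cosine(docs[i], docs[j]) for i, j in combinations(range(len(docs)), 2)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def similarity_transfer_score(source_docs, target_docs):
    """Assumed measure: correlation between the pairwise similarities of the
    source documents and those of their translations.

    source_docs[i] and target_docs[i] must represent the same document; the
    two vector spaces may have different dimensions (different vocabularies).
    """
    if len(source_docs) != len(target_docs):
        raise ValueError("each source document needs exactly one translation")
    return pearson(pairwise_similarities(source_docs),
                   pairwise_similarities(target_docs))


# Toy term-weight vectors (made up for illustration, not data from the paper).
source = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [[2.0, 0.0, 0.0, 0.0], [1.0, 3.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
score = similarity_transfer_score(source, target)
```

A score near 1 would suggest that documents which are close in the source space remain close after translation. A rank correlation or a nearest-neighbour overlap could be substituted for Pearson correlation without changing the overall setup.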
Similar resources
A new vector valued similarity measure for intuitionistic fuzzy sets based on OWA operators
Much research has been carried out on measures of distance, similarity, and correlation between intuitionistic fuzzy sets (IFSs). However, most of them are single-valued measures and lack potential for efficiency validation. In this paper, a new vector valued similarity measure for IFSs is proposed based on OWA operators. The vector is defined as a two-tuple consisting of ...
Multilingual document clusters discovery
The Cross-Language Information Retrieval community has produced search engines over multilingual corpora and multilingual text categorization systems. In this paper, we focus on the multilingual cluster discovery problem, whose aim is to extract topic-related multilingual document clusters from a multilingual document collection in an unsupervised way. Our approach is based on a linguistic anal...
Using Parallel Corpora to enrich Multilingual Lexical Resources
This paper describes the use of a bilingual vector model for the automatic discovery of German translations of English terms. The model is built by analysing co-occurrence patterns in a parallel corpus of English and German medical abstracts, a method also used for Cross-Lingual Information Retrieval. The model generates candidate German translations of English words using the cosine similarity m...
Key-phrase Extraction for Classification
In this paper we consider the problem of extracting key-phrases from a bilingual text collection and using them for text classification. A key-phrase could be defined as a sequence of words of a given size in a given partial order that occurs within a sentence. We describe an algorithm for the discovery of key-phrases. Then, a framework for handling multilingual texts / documents is described wh...
Judgment Language Matters: Multilingual Vector Space Models for Judgment Language Aware Lexical Semantics
A common evaluation practice in the vector space modeling (VSM) literature is to measure models’ ability to predict human judgments about lexical semantic relations between word pairs. Most existing evaluation sets, however, consist of scores collected for English word pairs only, ignoring the potential impact of the judgment language in which word pairs are presented on the human scores. In th...